Learning Concept Hierarchies from Text with a Guided Hierarchical Clustering Algorithm

نویسندگان

  • Philipp Cimiano
  • Steffen Staab
چکیده

We present an approach for the automatic induction of concept hierarchies from text collections. We propose a novel guided agglomerative hierarchical clustering algorithm exploiting a hypernym oracle to drive the clustering process. By inherently integrating the hypernym oracle into the clustering algorithm, we overcome two main problems of unsupervised clustering approaches relying on the distributional similarity of terms to induce concept hierarchies. First, by only clustering two terms if they have a hypernym in common we make sure that the cluster produced in this way is actually reasonable. Second, by labeling the clusters with the corresponding hypernym we overcome the labeling problem shared by all unsupervised approaches. We present results of a comparison of our approach with Caraballo’s method, assessing the quality of the automatically learned ontologies by comparing them to a handcrafted taxonomy for the tourism domain using the similarity measures of Maedche et al. Further, we also present a human evaluation of the concept hierarchy produced by our guided algorithm.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Learning Concept Hierarchies from Text Corpora using Formal Concept Analysis

We present a novel approach to the automatic acquisition of taxonomies or concept hierarchies from a text corpus. The approach is based on Formal Concept Analysis (FCA), a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. We follow Harris’ distributional hypothesis and model the context of a certain term as a vector representing syn...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Clustering Concept Hierarchies from Text

Abstract We present a novel approach to learning taxonomies or concept hierarchies from text. The approach is based on Formal Concept Analysis, a method mainly used for the analysis of data, i.e. for investigating and processing explicitly given information. Our approach is based on the distributional hypothesis, i.e. that nouns or terms are similar to the extent to which they share contexts. F...

متن کامل

Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data

Introduction: Clustering of human brain is a very useful tool for diagnosis, treatment, and tracking of brain tumors. There are several methods in this category in order to do this. In this study, modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005